41 research outputs found

    Greedy Algorithm for Set Cover in Context of Knowledge Discovery Problems

    Abstract. In this paper, some problems connected with the process of knowledge discovery are considered. These problems are reduced to the set cover problem. It is known that, under a plausible assumption on the class NP, the greedy algorithm is close to the best approximate polynomial algorithms for solving the set cover problem. Unfortunately, the performance ratio of this algorithm grows almost as the natural logarithm of the cardinality of the covered set. Instead of the usual greedy algorithm, we consider a greedy algorithm with a threshold. This algorithm constructs a partial cover, which covers at least a fixed part (for example, 90%) of the set. We prove that the cardinality of the constructed partial cover is bounded from above by a linear function of the minimal cardinality of an exact cover, Cmin. In the case of a 90%-cover, for example, we can take the function 2.31 · Cmin + 1. This bound is independent of the cardinality of the covered set. Notice that the concept of partial cover in the context of knowledge discovery problems is very close to the concept of approximate reduct
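A minimal sketch of a greedy procedure with a coverage threshold, as described in the abstract above. The function name `greedy_partial_cover`, its parameters, and the stopping rule as written here are illustrative assumptions, not the paper's exact formulation.

```python
def greedy_partial_cover(universe, subsets, fraction=0.9):
    """Greedily pick subsets until at least `fraction` of `universe` is covered.

    Hypothetical helper: returns the indices of the chosen subsets.
    """
    universe = set(universe)
    covered = set()
    chosen = []
    # Stop as soon as the required share of elements is covered.
    while len(covered) < fraction * len(universe):
        # Pick the subset that covers the most still-uncovered elements.
        best = max(
            range(len(subsets)),
            key=lambda i: len((set(subsets[i]) & universe) - covered),
        )
        gain = (set(subsets[best]) & universe) - covered
        if not gain:
            raise ValueError("the subsets cannot reach the requested coverage")
        covered |= gain
        chosen.append(best)
    return chosen


universe = range(1, 11)
subsets = [{1, 2, 3, 4, 5}, {6, 7, 8, 9}, {10}, {1, 6}]
partial = greedy_partial_cover(universe, subsets, fraction=0.9)
exact = greedy_partial_cover(universe, subsets, fraction=1.0)
```

With `fraction=1.0` the same loop reduces to the usual greedy algorithm for an exact cover, so the two variants can be compared on the same input.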

    Bounds on Depth of Decision Trees Derived from Decision Rule Systems

    Systems of decision rules and decision trees are widely used as a means for knowledge representation, as classifiers, and as algorithms. They are among the most interpretable models for classifying and representing knowledge. The study of relationships between these two models is an important task of computer science. It is easy to transform a decision tree into a decision rule system. The inverse transformation is a more difficult task. In this paper, we study unimprovable upper and lower bounds on the minimum depth of decision trees derived from decision rule systems depending on the various parameters of these systems
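The "easy" direction mentioned above, from a decision tree to a decision rule system, can be sketched by reading one rule off each root-to-leaf path. The nested-dictionary tree encoding and the name `tree_to_rules` are assumptions made for this illustration only.

```python
def tree_to_rules(node, conditions=()):
    """Return one rule (list of (attribute, value) conditions, decision) per leaf.

    Assumed encoding: a leaf is {"decision": d}; an internal node is
    {"attribute": a, "children": {value: subtree, ...}}.
    """
    if "decision" in node:
        return [(list(conditions), node["decision"])]
    rules = []
    for value, child in node["children"].items():
        # Extend the path with the test performed at this node.
        step = (node["attribute"], value)
        rules.extend(tree_to_rules(child, conditions + (step,)))
    return rules


tree = {
    "attribute": "f1",
    "children": {
        0: {"decision": "a"},
        1: {
            "attribute": "f2",
            "children": {0: {"decision": "b"}, 1: {"decision": "a"}},
        },
    },
}
rules = tree_to_rules(tree)
# The depth of the tree equals the length of its longest rule.
depth = max(len(conds) for conds, _ in rules)
```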

    Comparative Analysis of Deterministic and Nondeterministic Decision Trees for Decision Tables from Closed Classes

    In this paper, we consider classes of decision tables with many-valued decisions closed under operations of removal of columns, changing of decisions, permutation of columns, and duplication of columns. We study relationships among three parameters of these tables: the complexity of a decision table (if we consider the depth of decision trees, then the complexity of a decision table is the number of columns in it), the minimum complexity of a deterministic decision tree, and the minimum complexity of a nondeterministic decision tree. We consider rough classification of functions characterizing relationships and enumerate all possible seven types of the relationships

    Learning Decision Rules from Sets of Decision Trees

    This paper is devoted to the study of the problems of learning inner and general decision rules that are true for the maximum number of decision trees from a given set. Inner rules correspond to paths in decision trees from the root to terminal nodes. General rules are arbitrary rules that use attributes from the considered decision trees. We propose a polynomial time algorithm for the optimization of inner rules, show that the problem of optimization of general rules is NP-hard, and describe a heuristic for this problem. We compare the considered algorithm and heuristic experimentally on artificially generated datasets and on decision trees induced from them with the Gini index as a splitting criterion
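For reference, the Gini index mentioned above as a splitting criterion is commonly computed as one minus the sum of squared class proportions. The sketch below uses that standard definition; the function names are hypothetical and the paper's experimental setup is not reproduced here.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a collection of class labels: 1 - sum(p_i ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini impurity of a binary split (lower is better)."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


mixed = gini(["a", "a", "b", "b"])
pure = gini(["a", "a", "a", "a"])
perfect_split = split_gini(["a", "a"], ["b", "b"])
```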

    Critical properties and complexity measures of read-once Boolean functions

    In this paper, we define a quasi-order on the set of read-once Boolean functions and show that this is a well-quasi-order. This implies that every parameter measuring complexity of the functions can be characterized by a finite set of minimal subclasses of read-once functions, where this parameter is unbounded. We focus on two parameters related to certificate complexity and characterize each of them in the terminology of minimal classes

    WQO is decidable for factorial languages

    A language is factorial if it is closed under taking factors, i.e. contiguous subwords. Every factorial language can be described by an antidictionary, i.e. a minimal set of forbidden factors. We show that the problem of deciding whether a factorial language given by a finite antidictionary is well-quasi-ordered under the factor containment relation can be solved in polynomial time. We also discuss possible ways to extend our solution to permutations and graphs
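To make the definitions above concrete, the following sketch tests membership in a factorial language given by a finite antidictionary and checks closure under factors on one word. It illustrates the definitions only, not the polynomial-time decision procedure for well-quasi-ordering; the names `avoids` and `factors` are assumptions.

```python
def avoids(word, antidictionary):
    """True if `word` contains none of the forbidden factors."""
    return not any(forbidden in word for forbidden in antidictionary)


def factors(word):
    """All non-empty factors (contiguous subwords) of `word`."""
    return {
        word[i:j]
        for i in range(len(word))
        for j in range(i + 1, len(word) + 1)
    }


antidictionary = {"aa", "bab"}
in_language = avoids("abba", antidictionary)
not_in_language = avoids("abab", antidictionary)
# Every factor of a word in the language is itself in the language.
closed = all(avoids(f, antidictionary) for f in factors("abba"))
```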

    On Testing Membership to Maximal Consistent Extensions of Information Systems

    Abstract. This paper provides a new algorithm for testing membership to maximal consistent extensions of information systems. A maximal consistent extension of a given information system includes all objects corresponding to known attribute values which are consistent with all true and realizable rules extracted from the original information system. The algorithm presented here does not involve computing any rules and has polynomial time complexity. This algorithm is based on a simpler criterion for membership testing than the algorithm described i

    Decision rules derived from optimal decision trees with hypotheses

    Conventional decision trees use queries each of which is based on one attribute. In this study, we also examine decision trees that handle additional queries based on hypotheses. This kind of query is similar to the equivalence queries considered in exact learning. Earlier, we designed dynamic programming algorithms for the computation of the minimum depth and the minimum number of internal nodes in decision trees that have hypotheses. Modification of these algorithms considered in the present paper permits us to build decision trees with hypotheses that are optimal relative to the depth or relative to the number of the internal nodes. We compare the length and coverage of decision rules extracted from optimal decision trees with hypotheses and decision rules extracted from optimal conventional decision trees to choose the ones that are preferable as a tool for the representation of information. To this end, we conduct computer experiments on various decision tables from the UCI Machine Learning Repository. In addition, we also consider decision tables for randomly generated Boolean functions. The collected results show that the decision rules derived from decision trees with hypotheses in many cases are better than the rules extracted from conventional decision trees

    Learning probabilistic models of hydrogen bond stability from molecular dynamics simulation trajectories

    Hydrogen bonds (H-bonds) play a key role in both the formation and stabilization of protein structures. H-bonds involving atoms from residues that are close to each other in the main-chain sequence stabilize secondary structure elements. H-bonds between atoms from distant residues stabilize a protein’s tertiary structure. However, H-bonds greatly vary in stability. They form and break while a protein deforms. For instance, the transition of a protein from a nonfunctional to a functional state may require some H-bonds to break and others to form. The intrinsic strength of an individual H-bond has been studied from an energetic viewpoint, but energy alone may not be a very good predictor. Other local interactions may reinforce (or weaken) an H-bond. This paper describes inductive learning methods to train a protein-independent probabilistic model of H-bond stability from molecular dynamics (MD) simulation trajectories. The training data describes H-bond occurrences at successive times along these trajectories by the values of attributes called predictors. A trained model is constructed in the form of a regression tree in which each non-leaf node is a Boolean test (split) on a predictor. Each occurrence of an H-bond maps to a path in this tree from the root to a leaf node. Its predicted stability is associated with the leaf node. Experimental results demonstrate that such models can predict H-bond stability quite well. In particular, their performance is roughly 20 % better than that of models based on H-bond energy alone. In addition, they can accurately identify a large fraction of the least stable H-bonds in a give